Goto

Collaborating Authors

 perfect generalization


Dynamics of Generalization in Linear Perceptrons

Neural Information Processing Systems

We study the evolution of the generalization ability of a simple linear per(cid:173) ceptron with N inputs which learns to imitate a "teacher perceptron". The system is trained on p aN binary example inputs and the generaliza(cid:173) tion ability measured by testing for agreement with the teacher on all 2N possible binary input patterns. The dynamics may be solved analytically and exhibits a phase transition from imperfect to perfect generalization at a 1. Except at this point the generalization ability approaches its asymptotic value exponentially, with critical slowing down near the tran(cid:173) sition; the relaxation time is ex (1 - y'a)-2. Right at the critical point, 1 the approach to perfect generalization follows a power law ex t - '2.


Stability is Stable: Connections between Replicability, Privacy, and Adaptive Generalization

arXiv.org Artificial Intelligence

The notion of replicable algorithms was introduced in Impagliazzo et al. [STOC '22] to describe randomized algorithms that are stable under the resampling of their inputs. More precisely, a replicable algorithm gives the same output with high probability when its randomness is fixed and it is run on a new i.i.d. sample drawn from the same distribution. Using replicable algorithms for data analysis can facilitate the verification of published results by ensuring that the results of an analysis will be the same with high probability, even when that analysis is performed on a new data set. In this work, we establish new connections and separations between replicability and standard notions of algorithmic stability. In particular, we give sample-efficient algorithmic reductions between perfect generalization, approximate differential privacy, and replicability for a broad class of statistical problems. Conversely, we show any such equivalence must break down computationally: there exist statistical problems that are easy under differential privacy, but that cannot be solved replicably without breaking public-key cryptography. Furthermore, these results are tight: our reductions are statistically optimal, and we show that any computational separation between DP and replicability must imply the existence of one-way functions. Our statistical reductions give a new algorithmic framework for translating between notions of stability, which we instantiate to answer several open questions in replicability and privacy. This includes giving sample-efficient replicable algorithms for various PAC learning, distribution estimation, and distribution testing problems, algorithmic amplification of $\delta$ in approximate DP, conversions from item-level to user-level privacy, and the existence of private agnostic-to-realizable learning reductions under structured distributions.


Weight Priors for Learning Identity Relations

arXiv.org Machine Learning

Learning abstract and systematic relations has been an open issue in neural network learning for over 30 years. It has been shown recently that neural networks do not learn relations based on identity and are unable to generalize well to unseen data. The Relation Based Pattern (RBP) approach has been proposed as a solution for this problem. In this work, we extend RBP by realizing it as a Bayesian prior on network weights to model the identity relations. This weight prior leads to a modified regularization term in otherwise standard network learning. In our experiments, we show that the Bayesian weight priors lead to perfect generalization when learning identity based relations and do not impede general neural network learning. We believe that the approach of creating an inductive bias with weight priors can be extended easily to other forms of relations and will be beneficial for many other learning tasks.


Hidden Unit Specialization in Layered Neural Networks: ReLU vs. Sigmoidal Activation

arXiv.org Machine Learning

We study layered neural networks of rectified linear units (ReLU) in a modelling framework for stochastic training processes. The comparison with sigmoidal activation functions is in the center of interest. We compute typical learning curves for shallow networks with K hidden units in matching student teacher scenarios. The systems exhibit sudden changes of the generalization performance via the process of hidden unit specialization at critical sizes of the training set. Surprisingly, our results show that the training behavior of ReLU networks is qualitatively different from that of networks with sigmoidal activations. In networks with K >= 3 sigmoidal hidden units, the transition is discontinuous: Specialized network configurations co-exist and compete with states of poor performance even for very large training sets. On the contrary, the use of ReLU activations results in continuous transitions for all K: For large enough training sets, two competing, differently specialized states display similar generalization abilities, which coincide exactly for large networks in the limit K to infinity.


Examples of learning curves from a modified VC-formalism

Neural Information Processing Systems

We examine the issue of evaluation of model specific parameters in a modified VC-formalism. Two examples are analyzed: the 2-dimensional homogeneous perceptron and the I-dimensional higher order neuron. Both models are solved theoretically, and their learning curves are compared against true learning curves. It is shown that the formalism has the potential to generate a variety of learning curves, including ones displaying ''phase transitions."


Examples of learning curves from a modified VC-formalism

Neural Information Processing Systems

We examine the issue of evaluation of model specific parameters in a modified VC-formalism. Two examples are analyzed: the 2-dimensional homogeneous perceptron and the I-dimensional higher order neuron. Both models are solved theoretically, and their learning curves are compared against true learning curves. It is shown that the formalism has the potential to generate a variety of learning curves, including ones displaying ''phase transitions."


Examples of learning curves from a modified VC-formalism

Neural Information Processing Systems

We examine the issue of evaluation of model specific parameters in a modified VC-formalism. Two examples are analyzed: the 2-dimensional homogeneous perceptron and the I-dimensional higher order neuron. Both models are solved theoretically, and their learning curves are compared againsttrue learning curves. It is shown that the formalism has the potential to generate a variety of learning curves, including ones displaying ''phase transitions."


Dynamics of Generalization in Linear Perceptrons

Neural Information Processing Systems

We study the evolution of the generalization ability of a simple linear perceptron with N inputs which learns to imitate a "teacher perceptron". The system is trained on p aN binary example inputs and the generalization ability measured by testing for agreement with the teacher on all 2N possible binary input patterns. The dynamics may be solved analytically and exhibits a phase transition from imperfect to perfect generalization at a 1. Except at this point the generalization ability approaches its asymptotic value exponentially, with critical slowing down near the transition; the relaxation time is ex (1 - y'a)-2.


Dynamics of Generalization in Linear Perceptrons

Neural Information Processing Systems

We study the evolution of the generalization ability of a simple linear perceptron with N inputs which learns to imitate a "teacher perceptron". The system is trained on p aN binary example inputs and the generalization ability measured by testing for agreement with the teacher on all 2N possible binary input patterns. The dynamics may be solved analytically and exhibits a phase transition from imperfect to perfect generalization at a 1. Except at this point the generalization ability approaches its asymptotic value exponentially, with critical slowing down near the transition; the relaxation time is ex (1 - y'a)-2.


Dynamics of Generalization in Linear Perceptrons

Neural Information Processing Systems

We study the evolution of the generalization ability of a simple linear perceptron withN inputs which learns to imitate a "teacher perceptron". The system is trained on p aN binary example inputs and the generalization abilitymeasured by testing for agreement with the teacher on all 2N possible binary input patterns. The dynamics may be solved analytically and exhibits a phase transition from imperfect to perfect generalization at a 1. Except at this point the generalization ability approaches its asymptotic value exponentially, with critical slowing down near the transition; therelaxation time is ex (1 - y'a)-2.